Automatic font generation without human experts is a practical and significant problem, especially for languages that consist of a large number of characters. Existing font generation methods are often supervised: they require large amounts of paired data, which are labor-intensive and expensive to collect. Common unsupervised image-to-image translation methods, in turn, are not applicable to font generation, as they often define style as a set of textures and colors. In this work, we propose a robust deformable generative network for unsupervised font generation (abbreviated as DGFont++). We introduce a feature deformation skip connection (FDSC) to learn local patterns and geometric transformations between fonts. The FDSC predicts pairs of displacement maps and uses the predicted maps to apply deformable convolution to the low-level content feature maps. The outputs of the FDSC are fed into a mixer to generate the final results. Moreover, we introduce contrastive self-supervised learning to learn a robust style representation for fonts by understanding the similarities and dissimilarities of fonts. To distinguish different styles, we train our model with a multi-task discriminator, which ensures that each style can be discriminated independently. In addition to the adversarial loss, two reconstruction losses are adopted to constrain the domain-invariant characteristics between generated images and content images. Taking advantage of the FDSC and the adopted loss functions, our model maintains spatial information and generates high-quality character images in an unsupervised manner. Experiments demonstrate that our model generates character images of higher quality than state-of-the-art methods.
Translated by Google Translate
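The optimal design of federated learning (FL) algorithms for solving general machine learning (ML) problems in practical edge computing systems with quantized message passing remains an open problem. This paper considers an edge computing system in which the server and workers may have different computing and communication capabilities and apply quantization before transmitting messages. To explore the full potential of FL in such a system, we first present a general FL algorithm, namely GenQSGD, parameterized by the numbers of global and local iterations, the mini-batch size, and the step size sequence. Then, we analyze its convergence for an arbitrary step size sequence and specify the convergence results under three commonly used step size rules, namely the constant, exponential, and diminishing step size rules. Next, we optimize the algorithm parameters to minimize the energy cost under a time constraint and a convergence error constraint, focusing on the overall implementation process of FL. Specifically, for any given step size sequence under each considered step size rule, we optimize the numbers of global and local iterations and the mini-batch size to optimally implement FL for applications with preset step size sequences. We also optimize the step size sequence together with these algorithm parameters to explore the full potential of FL. The resulting optimization problems are non-convex problems with non-differentiable constraint functions. We propose iterative algorithms that obtain KKT points using general inner approximation (GIA) and tricks for solving complementary geometric programming (CGP). Finally, we numerically demonstrate the remarkable gains of GenQSGD with optimized algorithm parameters over existing FL algorithms and reveal the importance of optimally designing general FL algorithms.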
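Optimal algorithm design for federated learning (FL) remains an open problem. This paper explores the full potential of FL in practical edge computing systems, where workers may have different computation and communication capabilities and quantized intermediate model updates are sent between the server and the workers. First, we present a general quantized parallel mini-batch stochastic gradient descent (SGD) algorithm for FL, namely GenQSGD, which is parameterized by the number of global iterations, the numbers of local iterations at all workers, and the mini-batch size. We also analyze its convergence error for any choice of the algorithm parameters. Then, we optimize the algorithm parameters to minimize the energy cost under a time constraint and a convergence error constraint. The optimization problem is a challenging non-convex problem with non-differentiable constraint functions. We propose an iterative algorithm that obtains a KKT point using advanced optimization techniques. Numerical results demonstrate the significant gains of GenQSGD over existing FL algorithms and reveal the importance of optimally designing FL algorithms.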
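To make the idea of sending quantized model updates concrete, here is a minimal sketch of unbiased stochastic uniform quantization of an update vector. The abstract does not say which quantizer GenQSGD uses, so the quantizer itself, the function name `stochastic_quantize`, and the parameter `levels` are assumptions made purely for illustration.

```python
import math
import random

def stochastic_quantize(vec, levels, rng):
    """Map each entry to one of `levels` uniform steps in [0, max|v|], keeping its sign.

    Rounding up or down is randomized so that the quantized value is unbiased.
    (Hypothetical helper; not taken from the GenQSGD paper.)
    """
    scale = max(abs(v) for v in vec)
    if scale == 0.0:
        return [0.0] * len(vec)
    out = []
    for v in vec:
        x = abs(v) / scale * levels          # position in [0, levels]
        low = math.floor(x)
        # Round up with probability equal to the fractional part of x.
        q = low + (1 if rng.random() < x - low else 0)
        out.append(math.copysign(q * scale / levels, v))
    return out

rng = random.Random(0)
update = [0.30, -0.12, 0.05, -0.50]          # illustrative model update
quantized = stochastic_quantize(update, levels=4, rng=rng)
```

Each quantized entry lies within one quantization step of the original, and averaging many independent quantizations of the same vector approaches the original vector, which is the property that lets SGD-style analyses treat quantization as extra zero-mean noise.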
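Federated learning (FL) has become a hot research area for enabling the collaborative training of machine learning models among multiple clients that hold sensitive local data. Nevertheless, unconstrained federated optimization has been studied mainly using stochastic gradient descent (SGD), which may converge slowly, and constrained federated optimization, which is more challenging, has not been investigated so far. This paper investigates sample-based and feature-based federated optimization, respectively, and considers both unconstrained and constrained non-convex problems for each of them. First, we propose FL algorithms using stochastic successive convex approximation (SSCA) and mini-batch techniques. These algorithms can adequately exploit the structures of the objective and constraint functions and incrementally utilize samples. We show that the proposed FL algorithms converge to stationary points of the unconstrained non-convex problems and to Karush-Kuhn-Tucker (KKT) points of the constrained non-convex problems. Next, we provide algorithm examples with appealing computational complexity and communication load per round. We show that the proposed algorithm examples for unconstrained federated optimization are identical to FL algorithms based on momentum SGD, and we provide an analytical connection between SSCA and momentum SGD. Finally, numerical experiments demonstrate the inherent advantages of the proposed algorithms in convergence speed, communication and computation costs, and model specifications.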
Recent work reported the label alignment property in a supervised learning setting: the vector of all labels in the dataset lies mostly in the span of the top few singular vectors of the data matrix. Inspired by this observation, we derive a regularization method for unsupervised domain adaptation. Instead of regularizing representation learning, as popular domain adaptation methods do, we regularize the classifier so that the target domain predictions can to some extent "align" with the top singular vectors of the unsupervised data matrix from the target domain. In a linear regression setting, we theoretically justify the label alignment property and characterize the optimality of the solution of our regularization by bounding its distance to the optimal solution. We conduct experiments showing that our method works well on label shift problems, where classic domain adaptation methods are known to fail. We also report mild improvement over domain adaptation baselines on a set of commonly seen MNIST-USPS domain adaptation tasks and on cross-lingual sentiment analysis tasks.
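It remains challenging to deploy existing risk-averse approaches in real-world applications. The reasons are manifold, including the lack of global optimality guarantees and the necessity of learning from long-term consecutive trajectories. Long-term consecutive trajectories are prone to visiting hazardous states, which is a major concern in the risk-averse setting. This paper proposes Short-Term Volatility-controlled Policy Search (STOP), a novel algorithm that solves risk-averse problems by learning from short-term trajectories instead of long-term trajectories. Short-term trajectories are more flexible and can avoid the danger of hazardous state visitation. By using an actor-critic scheme with an overparameterized two-layer neural network, our algorithm finds a globally optimal policy at a sublinear rate with proximal policy optimization and natural policy gradient, with effectiveness comparable to the state-of-the-art convergence rate of risk-neutral policy search methods. The algorithm is evaluated on challenging MuJoCo robot simulation tasks under the mean-variance evaluation metric. Both theoretical analysis and experimental results show that STOP achieves state-of-the-art performance among existing risk-averse policy search methods.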
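Policy gradient (PG) estimators for softmax policies are ineffective under sub-optimally saturated initialization, which occurs when the density concentrates on a sub-optimal action. Sub-optimal policy saturation may arise from bad policy initialization or from sudden changes in the environment that occur after the policy has already converged, and softmax PG estimators then require a large number of updates to recover an effective policy. This severe problem causes high sample inefficiency and poor adaptability to new situations. To mitigate this problem, we propose a novel policy gradient estimator for softmax policies that uses the bias in the critic estimate and the noise present in the reward signal to escape the saturated regions of the policy parameter space. Our analysis and experiments, conducted on bandits and classical MDP benchmarking tasks, show that our estimator is more robust to policy saturation.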
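The saturation problem described above can be seen in a three-armed bandit. The sketch below computes the exact gradient of the expected reward for a softmax policy, once at a uniform initialization and once with the logits saturated on a sub-optimal arm. The reward values, the logits, and the helper names are assumptions chosen for illustration; this shows the standard softmax PG, not the estimator the paper proposes.

```python
import math

def softmax(theta):
    m = max(theta)                           # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def expected_pg(theta, rewards):
    """Exact gradient of J(theta) = sum_a pi_a * r_a for a softmax policy.

    dJ/dtheta_a = pi_a * (r_a - sum_b pi_b * r_b).
    """
    pi = softmax(theta)
    baseline = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - baseline) for p, r in zip(pi, rewards)]

def grad_norm(g):
    return math.sqrt(sum(x * x for x in g))

rewards = [1.0, 0.0, 0.0]                    # arm 0 is optimal (illustrative values)
uniform = [0.0, 0.0, 0.0]                    # unsaturated logits
saturated = [0.0, 10.0, 0.0]                 # mass concentrated on sub-optimal arm 1

norm_uniform = grad_norm(expected_pg(uniform, rewards))
norm_saturated = grad_norm(expected_pg(saturated, rewards))
```

In the saturated case every gradient component carries a factor that is either the tiny probability of a non-preferred arm or the tiny expected reward, so `norm_saturated` is orders of magnitude smaller than `norm_uniform`. This is why many updates are needed to move the policy away from the sub-optimal arm.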
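Experience replay (ER) with prioritized sampling has been empirically shown to improve sample efficiency in many domains and has attracted great attention. However, there is little theoretical understanding of why such prioritized sampling helps and what its limitations are. In this work, we take a deep look at prioritized ER. In a supervised learning setting, we show the equivalence between error-based prioritized sampling for the squared error and uniform sampling for a cubic power loss. We then provide theoretical insight into why prioritized sampling improves the convergence rate over uniform sampling during early learning. Based on this insight, we further point out two limitations of the prioritized ER method: 1) outdated priorities and 2) insufficient coverage of the sample space. To mitigate these limitations, we propose a model-based stochastic gradient Langevin dynamics sampling method. We show that our method does provide states distributed close to an ideal prioritized sampling distribution estimated by a brute-force method, which does not suffer from the two limitations. We conduct experiments on both discrete and continuous control problems to show the efficacy of our approach and examine the practical implications of our method in an autonomous driving application.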
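The stated equivalence can be checked numerically on a toy problem. The sketch below assumes a one-parameter linear model f(x) = w * x, a cubic loss normalized as |d|^3 / 3, and a squared loss normalized as d^2 / 2, where d is the prediction error. These normalizations, the data, and the function names are illustrative assumptions, not the paper's setup. Under these assumptions, the uniform-sampling gradient of the cubic loss equals the expected gradient of the squared loss under sampling proportional to |d|, scaled by the mean absolute error.

```python
def errors(w, xs, ys):
    return [w * x - y for x, y in zip(xs, ys)]

def uniform_cubic_grad(w, xs, ys):
    """Gradient w.r.t. w of (1/n) * sum_i |d_i|^3 / 3 (uniform sampling)."""
    d = errors(w, xs, ys)
    return sum(abs(di) * di * x for di, x in zip(d, xs)) / len(xs)

def prioritized_squared_grad(w, xs, ys):
    """Expected gradient of d_i^2 / 2 when i is drawn with probability |d_i| / sum_j |d_j|."""
    d = errors(w, xs, ys)
    total = sum(abs(di) for di in d)
    return sum((abs(di) / total) * di * x for di, x in zip(d, xs))

xs = [1.0, 2.0, -1.5, 0.5]                   # illustrative inputs
ys = [2.0, 3.5, -2.0, 2.0]                   # illustrative targets
w = 0.3

d = errors(w, xs, ys)
scale = sum(abs(di) for di in d) / len(d)    # mean absolute error
g_cubic = uniform_cubic_grad(w, xs, ys)
g_prior = prioritized_squared_grad(w, xs, ys)
```

Here `g_cubic` and `scale * g_prior` agree up to floating-point error, so the two update directions coincide and differ only by a positive factor that acts like a step-size rescaling.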
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.